23. Some more math
This section is given as bonus material and is not mandatory. If you are curious how we derived the final accumulative equation for BPTT, this section will help you out.
In the previous videos, we talked about Backpropagation Through Time. We used a lot of partial derivatives, accumulating the contributions to the change in the error from each state. Remember?
When we needed a general scheme for the BPTT, I simply displayed the equation without giving you further explanations.
As a reminder, the following two equations were derived when adjusting the weights of matrix W_s and matrix W_x:
![_Equation 48: BPTT calculations for the purpose of adjusting Ws_](img/screen-shot-2017-11-30-at-4.40.57-pm.png)
Equation 48: BPTT calculations for the purpose of adjusting Ws
![_Equation 49: BPTT calculations for the purpose of adjusting Wx_](img/screen-shot-2017-11-30-at-4.41.08-pm.png)
Equation 49: BPTT calculations for the purpose of adjusting Wx
Rather than proving equation 48 or equation 49 individually, we will work through a general framework that covers both.
Let's look at the following sketch, presenting a portion of a network:
![](img/screen-shot-2017-12-04-at-2.04.54-pm.png)
In the picture above, we have four states, starting with s_t.
We will initially consider the three weight matrices W_1, W_2 and W_3 as three different matrices.
Using the chain rule we can derive the following three equations:
![_Equation 50 (Equation set)_](img/screen-shot-2018-01-02-at-2.27.51-pm.png)
Equation 50 (Equation set)
In Backpropagation Through Time we accumulate the contributions, therefore:
![_Equation 51_](img/screen-shot-2017-12-04-at-3.54.17-pm.png)
Equation 51
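For readers who cannot view the images, here is a rough LaTeX sketch of what the chain-rule step and the accumulation step look like. It assumes that the sketch shows the chain s_t, s_{t+1}, s_{t+2}, s_{t+3}, that W_k is the matrix feeding into s_{t+k}, and that the output y is read from the last state. The symbol names are assumptions for illustration and may differ from the ones used in equations 50 and 51.

```latex
% Assumed chain: s_t --W_1--> s_{t+1} --W_2--> s_{t+2} --W_3--> s_{t+3} --> y
\begin{align*}
\frac{\partial y}{\partial W_3} &=
  \frac{\partial y}{\partial s_{t+3}}
  \frac{\partial s_{t+3}}{\partial W_3} \\
\frac{\partial y}{\partial W_2} &=
  \frac{\partial y}{\partial s_{t+3}}
  \frac{\partial s_{t+3}}{\partial s_{t+2}}
  \frac{\partial s_{t+2}}{\partial W_2} \\
\frac{\partial y}{\partial W_1} &=
  \frac{\partial y}{\partial s_{t+3}}
  \frac{\partial s_{t+3}}{\partial s_{t+2}}
  \frac{\partial s_{t+2}}{\partial s_{t+1}}
  \frac{\partial s_{t+1}}{\partial W_1}
\end{align*}

% Accumulating the three contributions:
\[
\frac{\partial y}{\partial W_1}
+ \frac{\partial y}{\partial W_2}
+ \frac{\partial y}{\partial W_3}
\]
```

The further back a weight sits in the chain, the more state-to-state factors its term picks up.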
Since this network is displayed as unfolded in time, we understand that the weight matrices connecting each of the states are identical. Therefore:
W_1=W_2=W_3
Let's simply call this shared matrix W. Therefore:
W_1=W_2=W_3=W
Equation 52
Combining W_1=W_2=W_3=W with equation 51 and the set of equations 50, we derive that:
![_Equation 52_](img/screen-shot-2017-12-04-at-11.23.49-pm.png)
Equation 52
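If you would like to convince yourself numerically that summing the three separate contributions gives the gradient for the shared matrix, the following minimal sketch may help. It is not the network from the videos: it assumes scalar weights and states, a tanh activation, no inputs, and an output y equal to the last state. The names `forward` and `separate_gradients` are made up for this illustration.

```python
import math

def forward(w1, w2, w3, s0):
    """Unfolded three-step chain s0 -> s1 -> s2 -> s3 (assumed scalar, tanh)."""
    s1 = math.tanh(w1 * s0)
    s2 = math.tanh(w2 * s1)
    s3 = math.tanh(w3 * s2)
    return s1, s2, s3

def separate_gradients(w1, w2, w3, s0):
    """Chain-rule derivatives of y = s3, treating w1, w2, w3 as different weights."""
    s1, s2, s3 = forward(w1, w2, w3, s0)
    # Derivative of tanh(a) with respect to a is 1 - tanh(a)^2.
    d1, d2, d3 = 1 - s1 ** 2, 1 - s2 ** 2, 1 - s3 ** 2
    dy_dw3 = d3 * s2
    dy_dw2 = d3 * w3 * d2 * s1
    dy_dw1 = d3 * w3 * d2 * w2 * d1 * s0
    return dy_dw1, dy_dw2, dy_dw3

w, s0 = 0.7, 0.5

# Accumulate the three contributions, then set w1 = w2 = w3 = w.
accumulated = sum(separate_gradients(w, w, w, s0))

# Independent check: central finite difference with respect to the shared weight.
eps = 1e-6
y_plus = forward(w + eps, w + eps, w + eps, s0)[2]
y_minus = forward(w - eps, w - eps, w - eps, s0)[2]
numeric = (y_plus - y_minus) / (2 * eps)

print(accumulated, numeric)
```

The two printed numbers should agree to several decimal places, which is the scalar version of the accumulation argument above.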
Equation 52 summarizes the mathematical procedure of BPTT and can be simply written as:
![_Equation 53_](img/screen-shot-2017-12-04-at-11.48.08-pm.png)
Equation 53
Notice that for i=t+1, we derive the following:
![_Equation 54_](img/screen-shot-2017-12-04-at-11.51.54-pm.png)
Equation 54
Using the chain rule, we can derive the following equation (it also appears in the set of equations 50):
![_Equation 55_](img/screen-shot-2017-12-04-at-11.54.48-pm.png)
Equation 55
A general derivation of the BPTT calculation can be written as follows:
![_Equation 56_](img/screen-shot-2017-12-05-at-12.04.21-am.png)
Equation 56
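The general scheme can also be sketched in code. The example below is a simplified illustration, not the implementation from the course: it assumes a scalar RNN with state update s_i = tanh(w_s * s_{i-1} + w_x * x_i), takes the output y to be the final state, and skips the error function entirely. The names `run_rnn`, `bptt_gradients` and `numeric_grad` are hypothetical. The loop walks backwards through time, adds the contribution of each state, and carries the state-to-state derivative along as a running product.

```python
import math

def run_rnn(w_s, w_x, xs, s0=0.0):
    """Forward pass; returns the list of states [s_0, s_1, ..., s_N]."""
    states = [s0]
    for x in xs:
        states.append(math.tanh(w_s * states[-1] + w_x * x))
    return states

def bptt_gradients(w_s, w_x, xs, s0=0.0):
    """Accumulate dy/dw_s and dy/dw_x for y = s_N by walking back through time."""
    states = run_rnn(w_s, w_x, xs, s0)
    grad_ws, grad_wx = 0.0, 0.0
    dy_ds = 1.0  # dy/ds_N, because y is assumed to equal s_N
    for i in range(len(xs), 0, -1):
        local = 1.0 - states[i] ** 2  # tanh derivative at step i
        # Contribution of state s_i: dy/ds_i times the direct ds_i/dw.
        grad_ws += dy_ds * local * states[i - 1]
        grad_wx += dy_ds * local * xs[i - 1]
        # Carry the chain one step back: dy/ds_{i-1} = dy/ds_i * ds_i/ds_{i-1}.
        dy_ds = dy_ds * local * w_s
    return grad_ws, grad_wx

def numeric_grad(f, v, eps=1e-6):
    """Central finite difference, used only as a sanity check."""
    return (f(v + eps) - f(v - eps)) / (2 * eps)

xs = [0.5, -0.3, 0.8, 0.1]
w_s, w_x, s0 = 0.6, -0.4, 0.2

g_ws, g_wx = bptt_gradients(w_s, w_x, xs, s0)
n_ws = numeric_grad(lambda v: run_rnn(v, w_x, xs, s0)[-1], w_s)
n_wx = numeric_grad(lambda v: run_rnn(w_s, v, xs, s0)[-1], w_x)

print(g_ws, n_ws)
print(g_wx, n_wx)
```

Each pair of printed values should match closely. With matrices instead of scalars the products become matrix products and the order of the factors matters, but the accumulation over states has the same shape.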